Data Description

This dataset is from the UCI Machine Learning Repository and can be found at: https://archive.ics.uci.edu/ml/datasets/Metro+Interstate+Traffic+Volume#. The dataset contains information on the hourly traffic volume for DOT ATR station 301 on the westbound I-94 from Minneapolis to Saint Paul, Minnesota. The original data is from 2012-2018 and includes the following variables:

Variable Type Details
date_time date/time hour of the data collected in local CST time
holiday categorical show US National holidays and regional holidays
temp numeric average temperature in kelvin
rain_1h numeric amount in mm of rain that occurred in the hour
snow_1h numeric amount in mm of snow that occurred in the hour
clouds_all numeric percentage of cloud cover
weather_main categorical short textual description of the current weather
weather_description categorical longer textual description of the current weather
traffic_volume numeric hourly I-94 ATR 301 reported westbound traffic volume

For the purpose of this time series analysis, we will remove the categorical variables, and focus on the date/time and numeric variables. Since there are a few numeric variables, we have also opted to remove snow_1h as the data is largely missing for this variable. The following list contains the variables used in this analysis:

Variable Type Details
date_time date/time hour of the data collected in local CST time
temp numeric average temperature in kelvin
rain_1h numeric amount in mm of rain that occurred in the hour
clouds_all numeric percentage of cloud cover
traffic_volume numeric hourly I-94 ATR 301 reported westbound traffic volume

The response is the reported traffic volume, specficially, the total traffice volume per day, based on the date as well as temperature and weather factors.

Problem Statement

The Minnesota state government has recevied mounting complaints over the past few years regarding the volume of traffic on the I-94 West from Minneapolis to St. Paul. Frustrated commuters have been voicing their concerns over the past few years, and at this time, the government is looking into expanding the lanes on the highway. Our firm has been enlisted to analyze the data collected on the weather and traffic volume, and make a determination if expansion is necessary. The government is looking for a consistent excess of traffic for the expansion, and plan on implementing other controls if it appears to vary seasonally or display other patterns.

With the vote approaching in the next few days, the government has requested we focus our attention on the most recent data and provide insights on the outlook of traffic volume for the next couple of weeks. Some involved in the vote only seem concerned with the current situation and we are tasked with determining if there is an imminent need.

knitr::opts_chunk$set(echo = TRUE)

#load libraries needed for analysis
library(dplyr)
## Warning: package 'dplyr' was built under R version 3.5.2
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(tswge)
library(vars)
## Loading required package: MASS
## 
## Attaching package: 'MASS'
## The following object is masked from 'package:dplyr':
## 
##     select
## Loading required package: strucchange
## Warning: package 'strucchange' was built under R version 3.5.2
## Loading required package: zoo
## 
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
## 
##     as.Date, as.Date.numeric
## Loading required package: sandwich
## Warning: package 'sandwich' was built under R version 3.5.2
## Loading required package: urca
## Loading required package: lmtest
library(nnfor)
## Warning: package 'nnfor' was built under R version 3.5.2
## Loading required package: forecast
library(tidyverse)
## ── Attaching packages ────────────────────────────────────────────────────────────────────────── tidyverse 1.2.1 ──
## ✔ ggplot2 3.0.0     ✔ readr   1.1.1
## ✔ tibble  2.1.3     ✔ purrr   0.2.5
## ✔ tidyr   1.0.0     ✔ stringr 1.3.1
## ✔ ggplot2 3.0.0     ✔ forcats 0.3.0
## Warning: package 'tibble' was built under R version 3.5.2
## Warning: package 'tidyr' was built under R version 3.5.2
## ── Conflicts ───────────────────────────────────────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ stringr::boundary() masks strucchange::boundary()
## ✖ dplyr::filter()     masks stats::filter()
## ✖ dplyr::lag()        masks stats::lag()
## ✖ MASS::select()      masks dplyr::select()
library(lubridate)
## 
## Attaching package: 'lubridate'
## The following object is masked from 'package:base':
## 
##     date

Data Preparation

This section focuses on the alerations made to the date to get it ready for the analysis. The following steps were taken:

  1. Removed the extra variables: holiday, snow_1h, weather_main, weather_description
  2. Removed duplicated records due to various repetitions present in the data
  3. Extracted the date from the date_time variable to use in aggregating the data to create variable date
  4. Removed the date_time variable, since data will be aggregated for each day
  5. Aggregated the remaining variables by date which involved using the hourly data to create daily data
    • temp changed to avg_temp: average temperature for each day in kelvin
    • rain_1h to avg_rain: average amount of mm of rain that occurred each day
    • clouds_all to avg_clouds: average percent of cloud cover for each day
    • traffic_volume to tot_traffic_vol: sum of reported westbound traffic volume for each day
  6. Filtered the data to focus on traffic volume data for 2018
#load Minnesota metro traffic volume dataset
#metro = read.csv(file="/Users/anjli/Documents/Metro_Interstate_Traffic_Volume.csv", header=TRUE, sep=",")
metro = read.csv("/Users/swee/Desktop/SMU/06_Fall 2019/ MSDS 6373 - Time Series/Project/Metro_Interstate_Traffic_Volume.csv", header=T)
head(metro)
##   holiday   temp rain_1h snow_1h clouds_all weather_main
## 1    None 288.28       0       0         40       Clouds
## 2    None 289.36       0       0         75       Clouds
## 3    None 289.58       0       0         90       Clouds
## 4    None 290.13       0       0         90       Clouds
## 5    None 291.14       0       0         75       Clouds
## 6    None 291.72       0       0          1        Clear
##   weather_description           date_time traffic_volume
## 1    scattered clouds 2012-10-02 09:00:00           5545
## 2       broken clouds 2012-10-02 10:00:00           4516
## 3     overcast clouds 2012-10-02 11:00:00           4767
## 4     overcast clouds 2012-10-02 12:00:00           5026
## 5       broken clouds 2012-10-02 13:00:00           4918
## 6        sky is clear 2012-10-02 14:00:00           5181
#remove extra variables not used in analysis
metro1 = metro[,-c(1,4,6,7)]
#verify 
head(metro1)
##     temp rain_1h clouds_all           date_time traffic_volume
## 1 288.28       0         40 2012-10-02 09:00:00           5545
## 2 289.36       0         75 2012-10-02 10:00:00           4516
## 3 289.58       0         90 2012-10-02 11:00:00           4767
## 4 290.13       0         90 2012-10-02 12:00:00           5026
## 5 291.14       0         75 2012-10-02 13:00:00           4918
## 6 291.72       0          1 2012-10-02 14:00:00           5181
dim(metro1)
## [1] 48204     5
#remove duplicated records
metro2 = metro1 %>% distinct(date_time, .keep_all = TRUE)
dim(metro2)
## [1] 40575     5
#extract Date from date_time variable
metro2$date <- as.Date(metro2$date_time)
dim(metro2)
## [1] 40575     6
#remove data_time variable
metro3 = metro2[c(-4)]
head(metro3)
##     temp rain_1h clouds_all traffic_volume       date
## 1 288.28       0         40           5545 2012-10-02
## 2 289.36       0         75           4516 2012-10-02
## 3 289.58       0         90           4767 2012-10-02
## 4 290.13       0         90           5026 2012-10-02
## 5 291.14       0         75           4918 2012-10-02
## 6 291.72       0          1           5181 2012-10-02
dim(metro3)
## [1] 40575     5
#aggregrate hourly data by date
data1 <- data.frame(metro3 %>% group_by(date) %>% summarise(avg_temp = mean(temp)))
data2 <- data.frame(metro3 %>% group_by(date) %>% summarise(avg_rain = mean(rain_1h)))
data3 <- data.frame(metro3 %>% group_by(date) %>% summarise(avg_clouds = mean(clouds_all)))
data4 <- data.frame(metro3 %>% group_by(date) %>% summarise(tot_traffic_vol = sum(traffic_volume)))

metro4 <- merge(data1, data2, by.x="date", by.y="date", no.dups)
metro4 <- merge(metro4, data3, by.x="date", by.y="date", no.dups)
metro4 <- merge(metro4, data4, by.x="date", by.y="date", no.dups)
head(metro4)
##         date avg_temp avg_rain avg_clouds tot_traffic_vol
## 1 2012-10-02 290.4033        0   29.13333           63289
## 2 2012-10-03 286.4135        0    3.85000           66345
## 3 2012-10-04 289.3575        0   16.70833           89939
## 4 2012-10-05 282.0782        0   75.00000           93336
## 5 2012-10-06 277.7461        0   61.65217           74910
## 6 2012-10-07 276.7395        0   82.04545           67290
dim(metro4)
## [1] 1860    5
#extract year information from date variable
metro4$year <- year(metro4$date)
head(metro4)
##         date avg_temp avg_rain avg_clouds tot_traffic_vol year
## 1 2012-10-02 290.4033        0   29.13333           63289 2012
## 2 2012-10-03 286.4135        0    3.85000           66345 2012
## 3 2012-10-04 289.3575        0   16.70833           89939 2012
## 4 2012-10-05 282.0782        0   75.00000           93336 2012
## 5 2012-10-06 277.7461        0   61.65217           74910 2012
## 6 2012-10-07 276.7395        0   82.04545           67290 2012
#trim the dataset to only include 2018 data
metro_2018 = metro4 %>% filter(year == 2018)
dim(metro_2018)
## [1] 273   6
#plots of 2018 aggregated metro data - ready for analysis
plotts.wge(metro_2018$tot_traffic_vol)

Models and Forecasting

The following models were fitted and forecasted for the analysis: ARUMA, VAR, Neural Network, and Ensemble. Multiple models were fitted for each method.

Our firm utilized each model performed below to develop forecasts and draw insights based on the data. With the upcoming vote, and focus placed on the present situation, we determined that a forecast horizon of 2 weeks/14 days should be used. The goal of using this horizon was to share the most relevant information based on our analysis to address the issue at hand.

ARUMA Models (Univariate Analysis)

We can see the seasonal behavior in the time series, therefore, we will fit ARUMA model(s) to forecast the traffic volume.

plotts.sample.wge(metro_2018$tot_traffic_vol)

## $autplt
##  [1]  1.0000000  0.3895276 -0.2217280 -0.3538884 -0.3607343 -0.2364733
##  [7]  0.2767563  0.6645843  0.2625422 -0.2452426 -0.3263143 -0.3346122
## [13] -0.2215449  0.2820379  0.6746746  0.2678139 -0.2393722 -0.3595937
## [19] -0.3749914 -0.2412569  0.2749591  0.6574318  0.2309840 -0.2688066
## [25] -0.3477532 -0.3528249
## 
## $freq
##   [1] 0.003663004 0.007326007 0.010989011 0.014652015 0.018315018
##   [6] 0.021978022 0.025641026 0.029304029 0.032967033 0.036630037
##  [11] 0.040293040 0.043956044 0.047619048 0.051282051 0.054945055
##  [16] 0.058608059 0.062271062 0.065934066 0.069597070 0.073260073
##  [21] 0.076923077 0.080586081 0.084249084 0.087912088 0.091575092
##  [26] 0.095238095 0.098901099 0.102564103 0.106227106 0.109890110
##  [31] 0.113553114 0.117216117 0.120879121 0.124542125 0.128205128
##  [36] 0.131868132 0.135531136 0.139194139 0.142857143 0.146520147
##  [41] 0.150183150 0.153846154 0.157509158 0.161172161 0.164835165
##  [46] 0.168498168 0.172161172 0.175824176 0.179487179 0.183150183
##  [51] 0.186813187 0.190476190 0.194139194 0.197802198 0.201465201
##  [56] 0.205128205 0.208791209 0.212454212 0.216117216 0.219780220
##  [61] 0.223443223 0.227106227 0.230769231 0.234432234 0.238095238
##  [66] 0.241758242 0.245421245 0.249084249 0.252747253 0.256410256
##  [71] 0.260073260 0.263736264 0.267399267 0.271062271 0.274725275
##  [76] 0.278388278 0.282051282 0.285714286 0.289377289 0.293040293
##  [81] 0.296703297 0.300366300 0.304029304 0.307692308 0.311355311
##  [86] 0.315018315 0.318681319 0.322344322 0.326007326 0.329670330
##  [91] 0.333333333 0.336996337 0.340659341 0.344322344 0.347985348
##  [96] 0.351648352 0.355311355 0.358974359 0.362637363 0.366300366
## [101] 0.369963370 0.373626374 0.377289377 0.380952381 0.384615385
## [106] 0.388278388 0.391941392 0.395604396 0.399267399 0.402930403
## [111] 0.406593407 0.410256410 0.413919414 0.417582418 0.421245421
## [116] 0.424908425 0.428571429 0.432234432 0.435897436 0.439560440
## [121] 0.443223443 0.446886447 0.450549451 0.454212454 0.457875458
## [126] 0.461538462 0.465201465 0.468864469 0.472527473 0.476190476
## [131] 0.479853480 0.483516484 0.487179487 0.490842491 0.494505495
## [136] 0.498168498
## 
## $db
##   [1]   0.70463842  -1.47914926  -0.19005867  -9.66775577 -13.45743158
##   [6]   3.70440533  -0.02275162  -5.80579282  -4.34440971  -5.23497602
##  [11] -14.03031355 -12.44117811  -0.60306958  -1.02750728  -2.29128720
##  [16]  -7.49542607  -2.94765219  -4.65718681  -2.86670716   0.17803140
##  [21]   1.90526369  -3.55157908  -5.38523042   0.57887408 -27.49930078
##  [26]  -2.59886522   2.61092615  -5.70292950 -11.43479766  -1.23399959
##  [31]  -6.92954070  -6.67825593  -2.75612632  -2.49549818 -15.92216153
##  [36]  -7.55133662  -3.25185280 -14.99151227  18.54562565   2.37602144
##  [41] -12.54650468 -15.94199621  -9.06857864  -4.99198894  -6.54149033
##  [46] -12.05734944  -4.77901956  -3.51160840  -8.08723451   2.76667117
##  [51] -10.06636616  -6.86246044  -6.74282357  -5.98107902  -3.30136377
##  [56]  -1.41993354  -3.80614167  -4.87590994  -7.78378907  -8.93789332
##  [61]  -4.20365190 -11.24856221 -23.70268451 -13.16335591  -5.07308369
##  [66]  -6.72953948  -1.92486931 -21.30824272  -2.89985036  -9.45792609
##  [71]  -7.23275624  -7.63364102 -13.55375380  -2.25055384 -17.51091961
##  [76]  -7.66470737  -4.09415679  13.25154092  -0.27144927  -5.83392429
##  [81]  -6.59539553 -13.06228291 -10.91223736  -8.21424881 -21.58194663
##  [86] -16.30135867 -10.30216728  -9.32225820  -1.99158139  -8.99707729
##  [91] -10.24989636 -18.21808417  -8.28808478 -10.70282215  -5.11005742
##  [96] -14.74243292 -12.64252766  -4.15892430 -11.87146587  -5.14521951
## [101]  -8.49292297 -11.78154219 -15.82241080  -1.88123910 -15.00817403
## [106] -12.35870109  -9.16891164  -9.14777219 -12.10238699 -13.94249595
## [111]  -3.18911205 -12.02880733  -7.54140651 -15.70399661 -14.33051350
## [116] -15.94555293   2.48780932 -15.69976938  -5.16270546 -12.47577822
## [121] -17.33530526 -13.13225909  -8.94795428 -13.77178285  -9.54630517
## [126]  -9.95481802 -37.79190467 -12.25366396  -5.99011805  -6.28140482
## [131] -10.94458674 -11.65116575 -16.64876207  -5.57644763  -4.86377600
## [136] -17.46716492
## 
## $dbz
##   [1] -1.38680114 -1.43039393 -1.50450531 -1.61061490 -1.74935061
##   [6] -1.91903912 -2.11400368 -2.32301373 -2.52863970 -2.70855498
##  [11] -2.83960158 -2.90415173 -2.89623172 -2.82386563 -2.70594643
##  [16] -2.56548026 -2.42297130 -2.29276729 -2.18302889 -2.09830583
##  [21] -2.04276200 -2.02162392 -2.03842974 -2.08618660 -2.13189319
##  [26] -2.09859365 -1.86267129 -1.29835340 -0.36590956  0.84620604
##  [31]  2.18197981  3.50133892  4.71286465  5.76707315  6.64054567
##  [36]  7.32369347  7.81346976  8.10945337  8.21185338  8.12055879
##  [41]  7.83479384  7.35322554  6.67459525  5.79920727  4.73200583
##  [46]  3.48857500  2.10593888  0.65879630 -0.72646623 -1.88724521
##  [51] -2.69942196 -3.16509506 -3.40345674 -3.55613304 -3.71822824
##  [56] -3.92790072 -4.18250134 -4.45786284 -4.72486854 -4.96203602
##  [61] -5.16154604 -5.32629545 -5.45867955 -5.54499216 -5.54147991
##  [66] -5.37237625 -4.95444018 -4.24744494 -3.29098142 -2.18696256
##  [71] -1.04912095  0.03248877  0.99914751  1.81677612  2.46700557
##  [76]  2.94040807  3.23224060  3.34000260  3.26214926  2.99755849
##  [81]  2.54559088  1.90679871  1.08457940  0.08837439 -1.06066462
##  [86] -2.32073378 -3.61845061 -4.84416062 -5.87167117 -6.61210320
##  [91] -7.06431622 -7.30367549 -7.42212680 -7.48251372 -7.51055739
##  [96] -7.50873023 -7.47411674 -7.41064243 -7.33226512 -7.25871964
## [101] -7.20833892 -7.19194147 -7.20940716 -7.24889565 -7.28844716
## [106] -7.30013760 -7.25685956 -7.14047659 -6.94827515 -6.69447670
## [111] -6.40619205 -6.11644432 -5.85774987 -5.65814966 -5.53964567
## [116] -5.51807815 -5.60346417 -5.80012142 -6.10616816 -6.51216851
## [121] -6.99890828 -7.53480102 -8.07453230 -8.56208476 -8.94148038
## [126] -9.17417500 -9.25375426 -9.20648314 -9.07718702 -8.91157061
## [131] -8.74494096 -8.59904854 -8.48393351 -8.40159519 -8.34974942
## [136] -8.32497686
#the spectral density plot of the time series shows similar system frequencies as the ones from the s=7 factor table 
factor.wge(phi=c(rep(0,6),1))
## 
## Coefficients of Original polynomial:  
## 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 1.0000 
## 
## Factor                 Roots                Abs Recip    System Freq 
## 1-1.0000B              1.0000               1.0000       0.0000
## 1+0.4450B+1.0000B^2   -0.2225+-0.9749i      1.0000       0.2857
## 1-1.2470B+1.0000B^2    0.6235+-0.7818i      1.0000       0.1429
## 1+1.8019B+1.0000B^2   -0.9010+-0.4339i      1.0000       0.4286
##   
## 
#remove the seasonal component (s=7) from the data 
metro_use = metro_2018$tot_traffic_vol
diff = artrans.wge(metro_use, phi.tr = c(rep(0,6),1))

aic5.wge(diff, p=0:13, q=0:3)
## ---------WORKING... PLEASE WAIT... 
## 
## 
## Error in aic calculation at 13 3 
## Five Smallest Values of  aic
##       p    q        aic
## 54   13    1   17.98748
## 53   13    0   18.00573
## 30    7    1   18.00631
## 33    8    0   18.00998
## 31    7    2   18.01124
aic5.wge(diff, p=0:13, q=0:3, type = "bic")
## ---------WORKING... PLEASE WAIT... 
## 
## 
## Error in aic calculation at 13 3 
## Five Smallest Values of  bic
##       p    q        bic
## 30    7    1   18.12756
## 33    8    0   18.13123
## 31    7    2   18.14596
## 34    8    1   18.14754
## 37    9    0   18.14979
#tried fitting different models from the aic5 and ARMA(13,0) and ARMA(8,0) give the best results
#First ARUMA model: start with ARMA(13,0)
#estimate and transform data (use mle estimates)
est.temp = est.arma.wge(diff, p=13, q=0)
## 
## Coefficients of Original polynomial:  
## 0.4433 -0.0696 -0.0169 -0.0091 0.0304 -0.0782 -0.5382 0.2339 -0.0794 0.0548 0.0482 0.0346 -0.1625 
## 
## Factor                 Roots                Abs Recip    System Freq 
## 1+0.9468B             -1.0562               0.9468       0.5000
## 1+1.1408B+0.8643B^2   -0.6600+-0.8494i      0.9297       0.3551
## 1-0.5934B+0.8617B^2    0.3443+-1.0207i      0.9283       0.1982
## 1-1.5630B+0.8263B^2    0.9457+-0.5619i      0.9090       0.0853
## 1-1.5971B+0.7181B^2    1.1121+-0.3948i      0.8474       0.0543
## 1-0.0276B+0.7002B^2    0.0197+-1.1949i      0.8368       0.2474
## 1+1.2503B+0.5546B^2   -1.1273+-0.7297i      0.7447       0.4086
##   
## 
diff1 = artrans.wge(diff, phi.tr = est.temp$phi)

#the acf shows that three of the lags are over the 95% limits line
plotts.sample.wge(diff1, arlimits=TRUE)

## $autplt
##  [1]  1.000000e+00 -5.012752e-02  2.088243e-02  4.105399e-02  1.482657e-02
##  [6] -2.030161e-02  5.855015e-02 -1.750474e-01  1.530251e-02  2.464260e-02
## [11] -1.316719e-02 -2.702766e-02  9.380294e-03  2.541205e-02 -2.901018e-01
## [16]  7.421219e-02 -6.477166e-03 -1.250824e-02 -2.721028e-05  4.033131e-02
## [21] -1.562574e-01 -1.218534e-02 -6.727750e-02 -7.826634e-02  2.986900e-02
## [26] -2.541235e-02
## 
## $freq
##   [1] 0.003952569 0.007905138 0.011857708 0.015810277 0.019762846
##   [6] 0.023715415 0.027667984 0.031620553 0.035573123 0.039525692
##  [11] 0.043478261 0.047430830 0.051383399 0.055335968 0.059288538
##  [16] 0.063241107 0.067193676 0.071146245 0.075098814 0.079051383
##  [21] 0.083003953 0.086956522 0.090909091 0.094861660 0.098814229
##  [26] 0.102766798 0.106719368 0.110671937 0.114624506 0.118577075
##  [31] 0.122529644 0.126482213 0.130434783 0.134387352 0.138339921
##  [36] 0.142292490 0.146245059 0.150197628 0.154150198 0.158102767
##  [41] 0.162055336 0.166007905 0.169960474 0.173913043 0.177865613
##  [46] 0.181818182 0.185770751 0.189723320 0.193675889 0.197628458
##  [51] 0.201581028 0.205533597 0.209486166 0.213438735 0.217391304
##  [56] 0.221343874 0.225296443 0.229249012 0.233201581 0.237154150
##  [61] 0.241106719 0.245059289 0.249011858 0.252964427 0.256916996
##  [66] 0.260869565 0.264822134 0.268774704 0.272727273 0.276679842
##  [71] 0.280632411 0.284584980 0.288537549 0.292490119 0.296442688
##  [76] 0.300395257 0.304347826 0.308300395 0.312252964 0.316205534
##  [81] 0.320158103 0.324110672 0.328063241 0.332015810 0.335968379
##  [86] 0.339920949 0.343873518 0.347826087 0.351778656 0.355731225
##  [91] 0.359683794 0.363636364 0.367588933 0.371541502 0.375494071
##  [96] 0.379446640 0.383399209 0.387351779 0.391304348 0.395256917
## [101] 0.399209486 0.403162055 0.407114625 0.411067194 0.415019763
## [106] 0.418972332 0.422924901 0.426877470 0.430830040 0.434782609
## [111] 0.438735178 0.442687747 0.446640316 0.450592885 0.454545455
## [116] 0.458498024 0.462450593 0.466403162 0.470355731 0.474308300
## [121] 0.478260870 0.482213439 0.486166008 0.490118577 0.494071146
## [126] 0.498023715
## 
## $db
##   [1]  -6.55035841 -24.97281865  -4.16224071 -18.06129527   1.95140383
##   [6]   6.79663545  -4.34373463   0.24536487   2.06703897 -15.83065991
##  [11]  -8.30717540   3.05139006   3.47872310  -0.51378559  -4.05230595
##  [16]  -2.48086271  -4.35188696   3.81903338   3.04156837  -8.29072586
##  [21]  -4.05116153  -1.01610878  -6.63640244  -2.34341059   4.81268172
##  [26]  -3.23425910  -5.13705507   4.08328692  -1.91379504  -5.23946517
##  [31]   4.56365149  -7.55902710  -4.76847374  -4.57267952 -27.81733699
##  [36] -21.79033794  -7.95983856 -10.44482915 -10.27930311  -4.99252911
##  [41]   2.05376922  -3.55936647   3.20297173   1.24496868  -1.74731715
##  [46]   3.45094476   0.86676885 -15.77312607  -5.06728160   1.71181683
##  [51]  -1.68231870  -1.52818515   1.75454650   1.13129759 -18.66101466
##  [56]  -3.36604518   3.96269891 -10.57719532  -3.45651080   1.66711186
##  [61]   0.93527664   4.60558202  -6.72500916   4.79697956  -2.46307481
##  [66]  -3.28857217  -0.98844106   0.65448640  -1.96646447 -15.31193529
##  [71]  -7.46152643  -9.66367523  -6.79759749  -5.86011979   0.47517291
##  [76] -21.97607684 -10.76613860   1.23105266  -0.51192154  -5.43097798
##  [81]   0.04673798   7.87462508   0.85494164   5.62541507  -6.09320386
##  [86]   2.00857316  -0.55332340   1.21466777  -5.24413805  -1.31036210
##  [91]  -2.33435677 -14.73627832   2.36512947   1.53920419  -5.26502136
##  [96]   7.56697315   0.37427165  -8.47773730  -1.17754633  -7.77651150
## [101]   1.54768084 -25.96970619   6.35364566  -2.54214334 -10.77865252
## [106] -10.44008212 -18.70691481 -20.36641209 -22.05943646  -9.05684123
## [111]  -1.93354880  -5.50604083  -7.36888346   0.42978066   0.87951638
## [116]   3.90880874  -1.50674700  -4.27523688   5.48608866  -4.97786069
## [121]   7.42226139  -1.86622136  -6.78031830   3.17146153  -6.41065334
## [126]  -3.96149904
## 
## $dbz
##   [1] -1.5333743891 -1.2658255762 -0.8938102253 -0.4901371984 -0.1131194900
##   [6]  0.2005455025  0.4335849194  0.5831956657  0.6563673502  0.6660864400
##  [11]  0.6281729885  0.5585198806  0.4708358888  0.3752748626  0.2783141083
##  [16]  0.1838746880  0.0951631669  0.0163701040 -0.0466347082 -0.0871214883
##  [21] -0.0997405598 -0.0833910515 -0.0434685328  0.0075831375  0.0516665564
##  [26]  0.0677073189  0.0345865296 -0.0663832669 -0.2495407264 -0.5231011677
##  [31] -0.8868202607 -1.3283477230 -1.8179542747 -2.3029971711 -2.7074127346
##  [36] -2.9458091331 -2.9564707164 -2.7350555700 -2.3372870687 -1.8477213909
##  [41] -1.3449309399 -0.8847633739 -0.4995331637 -0.2034252154  0.0018553508
##  [46]  0.1229672608  0.1722438344  0.1658529696  0.1222768497  0.0607729243
##  [51] -0.0002954528 -0.0451561120 -0.0621130204 -0.0445745651  0.0085056616
##  [56]  0.0927646575  0.1990485596  0.3144022069  0.4233114314  0.5090542270
##  [61]  0.5549991438  0.5457203330  0.4678861265  0.3109977221  0.0681844487
##  [66] -0.2625963139 -0.6764427888 -1.1578290738 -1.6749043985 -2.1727162373
##  [71] -2.5703861386 -2.7731926163 -2.7077636590 -2.3634951725 -1.8010547780
##  [76] -1.1183958680 -0.4094435764  0.2561948106  0.8342619569  1.2995789541
##  [81]  1.6397697673  1.8511377730  1.9366692585  1.9056523999  1.7743880863
##  [86]  1.5673671329  1.3178842329  1.0664457134  0.8551946876  0.7182055882
##  [91]  0.6711745909  0.7066429222  0.7981679709  0.9104604748  1.0093015248
##  [96]  1.0673488755  1.0656969584  0.9928752910  0.8429295116  0.6136070502
## [101]  0.3052017512 -0.0795685834 -0.5342616717 -1.0455219757 -1.5876650895
## [106] -2.1156986751 -2.5605628238 -2.8356843899 -2.8644243552 -2.6199763759
## [111] -2.1439172993 -1.5233290453 -0.8507446638 -0.1989196232  0.3842379561
## [116]  0.8707901509  1.2463851966  1.5055849842  1.6493301672  1.6841564117
## [121]  1.6226936979  1.4849481158  1.2995766908  1.1037405547  0.9395308956
## [126]  0.8455822347
#AIC picks ARMA(0,0) as the top model
aic5.wge(diff1, p=0:6, q=0:3)
## ---------WORKING... PLEASE WAIT... 
## 
## 
## Five Smallest Values of  aic
##      p    q        aic
## 1    0    0   17.90567
## 5    1    0   17.91105
## 2    0    1   17.91116
## 6    1    1   17.91815
## 9    2    0   17.91856
#Use Ljung-Box test to check for autocorrelations
ltest = ljung.wge(diff1, p=13, q=0)
## Obs -0.05012752 0.02088243 0.04105399 0.01482657 -0.02030161 0.05855015 -0.1750474 0.01530251 0.0246426 -0.01316719 -0.02702766 0.009380294 0.02541205 -0.2901018 0.07421219 -0.006477166 -0.01250824 -2.721028e-05 0.04033131 -0.1562574 -0.01218534 -0.0672775 -0.07826634 0.029869
#p-value < 0.05 ==> reject null, there is evidence that the residuals are serially correlated
ltest$pval
## [1] 3.660506e-06
ltest = ljung.wge(diff1, p=13, q=0, K=48)
## Obs -0.05012752 0.02088243 0.04105399 0.01482657 -0.02030161 0.05855015 -0.1750474 0.01530251 0.0246426 -0.01316719 -0.02702766 0.009380294 0.02541205 -0.2901018 0.07421219 -0.006477166 -0.01250824 -2.721028e-05 0.04033131 -0.1562574 -0.01218534 -0.0672775 -0.07826634 0.029869 -0.02541235 -0.1366727 0.05684349 -0.02859579 -0.02446095 -0.05540092 -0.003716251 0.01752697 -0.03971145 0.09442883 -0.05805845 -0.02170647 0.127178 -0.02546792 0.04242479 0.2026019 -0.02676919 0.1196677 0.06285411 0.08188191 -0.03291984 0.01956068 -0.07126766 -0.01812696
#p-value < 0.05 ==> reject null, there is evidence that the residuals are serially correlated
ltest$pval
## [1] 3.79828e-06
#forecast for the next 2 weeks
fore.ARUMA1 = fore.aruma.wge(metro_use, phi=est.temp$phi, s=7, n.ahead=14, lastn=TRUE, limits =TRUE)

ASE = mean((fore.ARUMA1$f - metro_use[(length(metro_use)-14+1):length(metro_use)])^2)
ASE
## [1] 25003365
#replot with color for clarify
plot(seq(1,273,1),metro_use, type = "l", ylim=c(25000,110000))
points(seq(260,273,1),fore.ARUMA1$f, type = "l", col="blue")
points(seq(260,273,1),fore.ARUMA1$ll, type = "l", lty=3, col="black")
points(seq(260,273,1),fore.ARUMA1$ul, type = "l", lty=3, col="black")

#Second ARUMA model: start with ARMA(8,0)
#estimate and transform data (use mle estimates)
est.temp = est.arma.wge(diff, p=8, q=0)
## 
## Coefficients of Original polynomial:  
## 0.4393 -0.0432 -0.0507 -0.0320 0.0169 0.0039 -0.5403 0.2108 
## 
## Factor                 Roots                Abs Recip    System Freq 
## 1-1.6390B+0.8462B^2    0.9685+-0.4938i      0.9199       0.0750
## 1-0.4515B+0.8451B^2    0.2671+-1.0545i      0.9193       0.2105
## 1+1.1355B+0.8323B^2   -0.6821+-0.8580i      0.9123       0.3569
## 1+0.9065B             -1.1032               0.9065       0.5000
## 1-0.3907B              2.5594               0.3907       0.0000
##   
## 
diff1 = artrans.wge(diff, phi.tr = est.temp$phi)

#the acf shows that two of the lags are over the 95% limits line
plotts.sample.wge(diff1, arlimits=TRUE)

## $autplt
##  [1]  1.000000000  0.005003919 -0.045886905  0.043189117  0.036317907
##  [6]  0.027130342 -0.031059072 -0.162207267  0.050826754 -0.024781320
## [11] -0.036683861  0.017854185  0.063955133 -0.047845218 -0.323561714
## [16]  0.040066797  0.042688230 -0.012224608 -0.038225095  0.017116539
## [21] -0.049411146  0.014482871 -0.046601145 -0.063998019  0.033895458
## [26] -0.053948786
## 
## $freq
##   [1] 0.003875969 0.007751938 0.011627907 0.015503876 0.019379845
##   [6] 0.023255814 0.027131783 0.031007752 0.034883721 0.038759690
##  [11] 0.042635659 0.046511628 0.050387597 0.054263566 0.058139535
##  [16] 0.062015504 0.065891473 0.069767442 0.073643411 0.077519380
##  [21] 0.081395349 0.085271318 0.089147287 0.093023256 0.096899225
##  [26] 0.100775194 0.104651163 0.108527132 0.112403101 0.116279070
##  [31] 0.120155039 0.124031008 0.127906977 0.131782946 0.135658915
##  [36] 0.139534884 0.143410853 0.147286822 0.151162791 0.155038760
##  [41] 0.158914729 0.162790698 0.166666667 0.170542636 0.174418605
##  [46] 0.178294574 0.182170543 0.186046512 0.189922481 0.193798450
##  [51] 0.197674419 0.201550388 0.205426357 0.209302326 0.213178295
##  [56] 0.217054264 0.220930233 0.224806202 0.228682171 0.232558140
##  [61] 0.236434109 0.240310078 0.244186047 0.248062016 0.251937984
##  [66] 0.255813953 0.259689922 0.263565891 0.267441860 0.271317829
##  [71] 0.275193798 0.279069767 0.282945736 0.286821705 0.290697674
##  [76] 0.294573643 0.298449612 0.302325581 0.306201550 0.310077519
##  [81] 0.313953488 0.317829457 0.321705426 0.325581395 0.329457364
##  [86] 0.333333333 0.337209302 0.341085271 0.344961240 0.348837209
##  [91] 0.352713178 0.356589147 0.360465116 0.364341085 0.368217054
##  [96] 0.372093023 0.375968992 0.379844961 0.383720930 0.387596899
## [101] 0.391472868 0.395348837 0.399224806 0.403100775 0.406976744
## [106] 0.410852713 0.414728682 0.418604651 0.422480620 0.426356589
## [111] 0.430232558 0.434108527 0.437984496 0.441860465 0.445736434
## [116] 0.449612403 0.453488372 0.457364341 0.461240310 0.465116279
## [121] 0.468992248 0.472868217 0.476744186 0.480620155 0.484496124
## [126] 0.488372093 0.492248062 0.496124031 0.500000000
## 
## $db
##   [1]  -6.83234558 -21.62019737  -4.66599220 -16.51991859   0.45393971
##   [6]   6.32507399  -0.26269886   1.12861301   2.07542048 -15.30249175
##  [11]  -6.98727957   1.98295165   5.06886769   1.65852153  -7.08556409
##  [16]   1.40851143  -4.29230420  -0.31983135   1.95629645  -3.45116236
##  [21]  -7.24479936   1.18027893  -1.54253826  -4.13207440   5.48610753
##  [26]  -1.29449073  -1.15144229  -1.84468638   3.33079907   0.21677062
##  [31]   0.51587590   1.09066053 -12.75142065  -9.58881985  -7.80365615
##  [36] -20.41845266 -29.77192775  -6.78653544 -22.64291616 -17.85606718
##  [41]  -4.23471810   0.06254853  -6.45011859   3.60121996   2.08651328
##  [46]  -0.45096611   6.56691084  -0.54401829 -19.13984882  -2.83420604
##  [51]   3.22237718  -2.21660675  -1.79744826   1.34605374  -1.45079403
##  [56] -11.51118911  -3.30552991   2.42712345  -5.89361987  -1.09975158
##  [61]   1.92438499   3.58129672   3.94923960   1.89934852   4.08922219
##  [66]   4.91494000   2.90391170   1.57207348  -4.59616803   2.77070577
##  [71] -16.89185668  -7.18321625  -6.97555599  -8.87225872 -11.59575371
##  [76]  -6.63104022  -1.61266447  -6.32735889  -3.52716075  -3.52415790
##  [81]  -4.51623204  -1.40805482   0.15489247   8.21441107  -0.66789856
##  [86]  -0.33081047 -14.62111760   1.36895049   1.52976225   0.50556887
##  [91]  -6.31657500   2.53253948 -10.58889684  -8.95479273   0.98989361
##  [96]   1.20181204  -7.28250013   8.02254380  -2.59094312  -6.67435778
## [101]   0.31196300  -3.80224939   0.84830679 -17.59427270   6.38415594
## [106]  -2.18546283  -5.09142761  -6.90583849 -20.17739964 -12.86496292
## [111] -13.54858391  -6.53554054  -3.19148956  -6.82500086  -3.05587375
## [116]  -1.57694582  -0.66822238   1.82267393  -0.46492765  -5.69894177
## [121]   0.36273352   2.84993380   3.75063779   2.97920777 -10.58580385
## [126]  -1.46403728  -0.68553427  -1.58617714  -6.32439287
## 
## $dbz
##   [1] -1.887407728 -1.567544942 -1.123182692 -0.638838386 -0.179789419
##   [6]  0.214031934  0.523568702  0.743953381  0.878973286  0.937210976
##  [11]  0.929548332  0.867710523  0.763670859  0.629786268  0.479424949
##  [16]  0.327622647  0.191082289  0.086817502  0.029176244  0.025922999
##  [21]  0.075029266  0.163904381  0.271587744  0.372801612  0.442063511
##  [26]  0.456586813  0.397689810  0.251130758  0.007030062 -0.339947606
##  [31] -0.789453759 -1.332510158 -1.945748589 -2.582373173 -3.162661540
##  [36] -3.575377532 -3.709074539 -3.511946874 -3.028744468 -2.370606219
##  [41] -1.654177967 -0.965791830 -0.357438560  0.144132796  0.527854084
##  [46]  0.791879839  0.940021726  0.980193185  0.924201968  0.788498846
##  [51]  0.595567248  0.375404747  0.165934502  0.010427862 -0.049866328
##  [56]  0.012832153  0.202513013  0.497078260  0.855915517  1.232033342
##  [61]  1.581976309  1.870679499  2.072221681  2.168559439  2.147797330
##  [66]  2.002780599  1.730339118  1.331381627  0.812137125  0.187074230
##  [71] -0.515781355 -1.246896343 -1.927234852 -2.449899166 -2.707306335
##  [76] -2.643708081 -2.291211774 -1.748432981 -1.126814189 -0.514340735
##  [81]  0.032060256  0.480171234  0.814422510  1.030324841  1.131638111
##  [86]  1.129507516  1.042705608  0.898016053  0.729477196  0.574979445
##  [91]  0.469402379  0.435761784  0.478564612  0.583546771  0.723969716
##  [96]  0.869440890  0.992855287  1.073682541  1.098212513  1.058157916
## [101]  0.948771363  0.767184896  0.511373583  0.180006137 -0.226581784
## [106] -0.704014633 -1.239789113 -1.807389567 -2.359761141 -2.826613473
## [111] -3.124521351 -3.186306690 -2.996604647 -2.602109107 -2.085697447
## [116] -1.530635147 -1.000420718 -0.535857538 -0.159908931  0.116610923
## [121]  0.291031497  0.366769970  0.352814055  0.264204310  0.122847707
## [126] -0.042324489 -0.197224986 -0.307740806 -0.347855398
#AIC picks ARMA(5,2) as the top model
aic5.wge(diff1, p=0:6, q=0:3)
## ---------WORKING... PLEASE WAIT... 
## 
## 
## Five Smallest Values of  aic
##       p    q        aic
## 23    5    2   17.91568
## 27    6    2   17.92884
## 1     0    0   17.93351
## 2     0    1   17.94124
## 5     1    0   17.94124
#estimate and transform data one more time
est.temp2 =est.arma.wge(diff1, p=5, q=2)
## 
## Coefficients of Original polynomial:  
## -0.3765 -0.9319 0.0370 -0.0173 0.1458 
## 
## Factor                 Roots                Abs Recip    System Freq 
## 1+0.2280B+0.8288B^2   -0.1375+-1.0898i      0.9104       0.2700
## 1+0.6262B+0.3683B^2   -0.8502+-1.4116i      0.6069       0.3363
## 1-0.4776B              2.0937               0.4776       0.0000
##   
## 
#the acf shows that only one lag is over the 95% limits line
plotts.sample.wge(est.temp2$res, arlimits=TRUE)

## $autplt
##  [1]  1.0000000000 -0.0056924382  0.0048677237 -0.0008197268  0.0390925019
##  [6] -0.0792106991  0.0211174205 -0.0934521674 -0.0317067534 -0.0689593210
## [11]  0.0662339107  0.0577436687 -0.0265994144 -0.0480287053 -0.2518191218
## [16]  0.0229370477 -0.0363653935  0.0515787038  0.0071977246 -0.0060641238
## [21] -0.0841454907  0.0465454418 -0.0234511332 -0.0786053773  0.0101657861
## [26] -0.0185984238
## 
## $freq
##   [1] 0.003875969 0.007751938 0.011627907 0.015503876 0.019379845
##   [6] 0.023255814 0.027131783 0.031007752 0.034883721 0.038759690
##  [11] 0.042635659 0.046511628 0.050387597 0.054263566 0.058139535
##  [16] 0.062015504 0.065891473 0.069767442 0.073643411 0.077519380
##  [21] 0.081395349 0.085271318 0.089147287 0.093023256 0.096899225
##  [26] 0.100775194 0.104651163 0.108527132 0.112403101 0.116279070
##  [31] 0.120155039 0.124031008 0.127906977 0.131782946 0.135658915
##  [36] 0.139534884 0.143410853 0.147286822 0.151162791 0.155038760
##  [41] 0.158914729 0.162790698 0.166666667 0.170542636 0.174418605
##  [46] 0.178294574 0.182170543 0.186046512 0.189922481 0.193798450
##  [51] 0.197674419 0.201550388 0.205426357 0.209302326 0.213178295
##  [56] 0.217054264 0.220930233 0.224806202 0.228682171 0.232558140
##  [61] 0.236434109 0.240310078 0.244186047 0.248062016 0.251937984
##  [66] 0.255813953 0.259689922 0.263565891 0.267441860 0.271317829
##  [71] 0.275193798 0.279069767 0.282945736 0.286821705 0.290697674
##  [76] 0.294573643 0.298449612 0.302325581 0.306201550 0.310077519
##  [81] 0.313953488 0.317829457 0.321705426 0.325581395 0.329457364
##  [86] 0.333333333 0.337209302 0.341085271 0.344961240 0.348837209
##  [91] 0.352713178 0.356589147 0.360465116 0.364341085 0.368217054
##  [96] 0.372093023 0.375968992 0.379844961 0.383720930 0.387596899
## [101] 0.391472868 0.395348837 0.399224806 0.403100775 0.406976744
## [106] 0.410852713 0.414728682 0.418604651 0.422480620 0.426356589
## [111] 0.430232558 0.434108527 0.437984496 0.441860465 0.445736434
## [116] 0.449612403 0.453488372 0.457364341 0.461240310 0.465116279
## [121] 0.468992248 0.472868217 0.476744186 0.480620155 0.484496124
## [126] 0.488372093 0.492248062 0.496124031 0.500000000
## 
## $db
##   [1]  -7.372574420 -22.344768901  -5.154051143 -17.108342180   0.002146225
##   [6]   5.924219237  -0.627374901   0.811987568   1.802012601 -15.666653270
##  [11]  -7.226147560   1.858824501   4.970551501   1.645788383  -7.032422649
##  [16]   1.483813794  -4.126902325  -0.113966255   2.194253069  -3.130100787
##  [21]  -6.907132947   1.583350905  -1.085851788  -3.627189320   6.032403397
##  [26]  -0.684917538  -0.487077522  -1.203812994   4.045616778   0.922698827
##  [31]   1.257782844   1.884996300 -11.955679318  -8.894971077  -6.931369674
##  [36] -19.416089439 -28.044373513  -6.032232203 -22.408988993 -16.848620961
##  [41]  -3.567092806   0.708154018  -5.841308295   4.131912210   2.567983175
##  [46]  -0.075591625   6.877472054  -0.378299339 -18.416059890  -2.819737946
##  [51]   3.046319849  -2.513756187  -2.083253502   0.816790244  -2.014254564
##  [56] -12.847940193  -4.517889124   1.046481916  -7.368340072  -2.789606242
##  [61]   0.146782678   1.299129372   1.426188691  -0.844919155   1.087634409
##  [66]   1.788359982   0.013197363  -1.465103241  -7.277737906   3.593194832
##  [71] -10.717178199   0.177004952   4.994635860   0.215383918  -4.847308903
##  [76]  -0.855492666   2.129051909  -3.460374801  -1.257462634  -2.412830908
##  [81]  -4.495929659  -1.545091800   0.054312883   7.654581266  -1.415961243
##  [86]  -1.267540709 -16.608912701   0.153092590   0.387119399  -0.973689979
##  [91]  -7.886232762   1.069157803 -11.531270613 -10.720926815  -0.174089658
##  [96]   0.089755343  -8.244995762   6.927204068  -3.431287243  -7.631170331
## [101]  -0.446955883  -4.588262293   0.409155309 -18.869683404   6.110346872
## [106]  -2.242293239  -5.276641585  -6.799784845 -19.856574173 -12.544318764
## [111] -13.120200715  -6.095808059  -2.499062625  -6.083080450  -2.381475554
## [116]  -0.778484196   0.313843881   2.783064832   0.578009104  -4.504003603
## [121]   1.481419246   4.069480143   4.987101430   4.284156285  -9.099397220
## [126]  -0.067510230   0.598976919  -0.262250327  -5.096189735
## 
## $dbz
##   [1] -2.28426941 -1.95743863 -1.50243120 -1.00439628 -0.52879641
##   [6] -0.11541549  0.21696839  0.46373721  0.62880261  0.72071356
##  [11]  0.75019427  0.72878734  0.66839760  0.58154726  0.48201889
##  [16]  0.38534254  0.30841777  0.26767035  0.27573354  0.33763895
##  [21]  0.44826472  0.59247288  0.74794483  0.88931905  0.99193937
##  [26]  1.03425354  0.99882141  0.87242526  0.64590830  0.31434142
##  [31] -0.12185971 -0.65409453 -1.26008132 -1.89524529 -2.48347727
##  [36] -2.91758052 -3.08735169 -2.93629245 -2.50074530 -1.88574532
##  [41] -1.20659249 -0.55122856  0.02596722  0.49604198  0.84604989
##  [46]  1.07244507  1.17730296  1.16652677  1.04934937  0.83871436
##  [51]  0.55226711  0.21363562 -0.14662333 -0.49155897 -0.78135135
##  [56] -0.98062223 -1.06779474 -1.04178345 -0.92152206 -0.73844584
##  [61] -0.52655412 -0.31474923 -0.12314943  0.03736781  0.16376524
##  [66]  0.25858751  0.32739754  0.37629107  0.40991624  0.43045732
##  [71]  0.43787374  0.43131112  0.41118680  0.38117934  0.34927474
##  [76]  0.32713083  0.32740485  0.35949322  0.42519182  0.51632963
##  [81]  0.61567091  0.70065865  0.74829314  0.73950811  0.66236241
##  [86]  0.51422033  0.30332585  0.04977781 -0.21492313 -0.45141669
##  [91] -0.62053188 -0.69466181 -0.66725107 -0.55430704 -0.38701231
##  [96] -0.20075609 -0.02666281  0.11206623  0.20002108  0.22762875
## [101]  0.18888207  0.07958951 -0.10345062 -0.36214491 -0.69550120
## [106] -1.09651192 -1.54732994 -2.01308501 -2.43668327 -2.74096741
## [111] -2.84715393 -2.70892448 -2.33910243 -1.80280683 -1.18445855
## [116] -0.55835133  0.02280634  0.52651498  0.93500132  1.24028749
## [121]  1.44107859  1.54135659  1.55024649  1.48267654  1.36023005
## [126]  1.21126969  1.06905359  0.96671637  0.92943935
#AIC picks ARMA(0,0) as the top model
aic5.wge(est.temp2$res, p=0:6, q=0:3)
## ---------WORKING... PLEASE WAIT... 
## 
## 
## Five Smallest Values of  aic
##      p    q        aic
## 1    0    0   17.86141
## 5    1    0   17.86913
## 2    0    1   17.86913
## 9    2    0   17.87686
## 3    0    2   17.87686
#Use Ljung-Box test to check for autocorrelations
ltest = ljung.wge(est.temp2$res, p=5, q=2)
## Obs -0.005692438 0.004867724 -0.0008197268 0.0390925 -0.0792107 0.02111742 -0.09345217 -0.03170675 -0.06895932 0.06623391 0.05774367 -0.02659941 -0.04802871 -0.2518191 0.02293705 -0.03636539 0.0515787 0.007197725 -0.006064124 -0.08414549 0.04654544 -0.02345113 -0.07860538 0.01016579
#p-value > 0.05 ==> fail to reject null, there is not enough evidence that the residuals are serially correlated 
ltest$pval
## [1] 0.01392211
ltest = ljung.wge(est.temp2$res, p=5, q=2, K=48)
## Obs -0.005692438 0.004867724 -0.0008197268 0.0390925 -0.0792107 0.02111742 -0.09345217 -0.03170675 -0.06895932 0.06623391 0.05774367 -0.02659941 -0.04802871 -0.2518191 0.02293705 -0.03636539 0.0515787 0.007197725 -0.006064124 -0.08414549 0.04654544 -0.02345113 -0.07860538 0.01016579 -0.01859842 -0.1377344 0.02618051 -0.02079836 -0.009630042 -0.08153746 -0.01428643 0.03538718 -0.02604908 0.06916702 -0.04530782 -0.0271525 0.09198383 -0.006977457 0.0775433 0.1630607 -0.01512161 0.1184179 0.05821979 0.07877787 -0.03876791 -0.006666138 -0.07478755 -0.03483509
#p-value < 0.05 ==> reject null, there is evidence that the residuals are serially correlated
ltest$pval
## [1] 0.00837519
#multiple the two phi factors from the two estimates for forecasting
phi_comb = mult.wge(est.temp$phi, est.temp2$phi)

#forecast for the next 2 weeks
fore.ARUMA2 = fore.aruma.wge(metro_use, phi=phi_comb$model.coef, theta=est.temp2$theta, s=7, n.ahead=14, lastn=TRUE, limits =TRUE)

ASE = mean((fore.ARUMA2$f - metro_use[(length(metro_use)-14+1):length(metro_use)])^2)
ASE
## [1] 22680059
#replot with color for clarify
plot(seq(1,273,1),metro_use, type = "l", ylim=c(25000,110000))
points(seq(260,273,1),fore.ARUMA2$f, type = "l", col="blue")
points(seq(260,273,1),fore.ARUMA2$ll, type = "l", lty=3, col="black")
points(seq(260,273,1),fore.ARUMA2$ul, type = "l", lty=3, col="black")

VAR Models

This is a multivariate approach that uses more than one time-series variable to better understand the data.

#plot realization for all the variables
plotts.wge(metro_2018$tot_traffic_vol)

plotts.wge(metro_2018$avg_temp)

plotts.wge(metro_2018$avg_rain)

plotts.wge(metro_2018$avg_clouds)

#check for lagged variables ==> doesn't seem that there are any
ccf(metro_2018$tot_traffic_vol,metro_2018$avg_temp, ylim = c(-1,1))

ccf(metro_2018$tot_traffic_vol,metro_2018$avg_rain, ylim = c(-1,1))

ccf(metro_2018$tot_traffic_vol,metro_2018$avg_clouds, ylim = c(-1,1))

#create subsets to exclude the last 14 obs for prediction
traffic_vol14 = metro_2018$tot_traffic_vol[1:259]
avg_temp14 = metro_2018$avg_temp[1:259]
avg_rain14 = metro_2018$avg_rain[1:259]
avg_clouds14 = metro_2018$avg_clouds[1:259]

#VAR_1 - with temp as the regressor
X=cbind(traffic_vol14,avg_temp14)
VARselect(X, lag.max = 10, type = "const",season = 7, exogen = NULL)
## $selection
## AIC(n)  HQ(n)  SC(n) FPE(n) 
##      5      3      1      5 
## 
## $criteria
##                   1            2            3            4            5
## AIC(n) 1.999413e+01 2.001621e+01 1.993501e+01 1.995247e+01 1.991473e+01
## HQ(n)  2.009648e+01 2.014130e+01 2.008285e+01 2.012305e+01 2.010805e+01
## SC(n)  2.024841e+01 2.032699e+01 2.030230e+01 2.037626e+01 2.039502e+01
## FPE(n) 4.823584e+08 4.931505e+08 4.547245e+08 4.627796e+08 4.456975e+08
##                   6            7            8            9           10
## AIC(n) 1.993561e+01 1.991920e+01 1.992054e+01 1.993812e+01 1.993403e+01
## HQ(n)  2.015168e+01 2.015801e+01 2.018210e+01 2.022243e+01 2.024108e+01
## SC(n)  2.047241e+01 2.051250e+01 2.057035e+01 2.064444e+01 2.069685e+01
## FPE(n) 4.551796e+08 4.478634e+08 4.485791e+08 4.566735e+08 4.549698e+08
#VARselect picks p=5 (using AIC)
lsfit=VAR(X, p=5, type='const', season = 7, exogen = NULL)

#forecast for the next 2 weeks
preds=predict(lsfit, n.ahead=14)
forecast_VAR1 = preds$fcst$traffic_vol14[1:14,1]

ASE = mean((forecast_VAR1 - metro_2018$tot_traffic_vol[(length(metro_2018$tot_traffic_vol)-14+1):length(metro_2018$tot_traffic_vol)])^2)
ASE
## [1] 7691108
#plot the forecasts
plot(seq(1,273,1),metro_2018$tot_traffic_vol, type = "l", ylim=c(25000,110000))
points(seq(260,273,1),forecast_VAR1, type = "l", col="blue")
points(seq(260,273,1),preds$fcst$traffic_vol14[1:14,2], type = "l", lty=3, col="black")
points(seq(260,273,1),preds$fcst$traffic_vol14[1:14,3], type = "l", lty=3, col="black")

#VAR_2 - with rain as the regressor
X=cbind(traffic_vol14,avg_rain14)
VARselect(X, lag.max = 10, type = "const",season = 7, exogen = NULL)
## $selection
## AIC(n)  HQ(n)  SC(n) FPE(n) 
##      6      3      3      6 
## 
## $criteria
##                   1            2            3            4            5
## AIC(n) 1.417869e+01 1.412357e+01 1.406083e+01 1.406556e+01 1.407775e+01
## HQ(n)  1.428104e+01 1.424866e+01 1.420867e+01 1.423614e+01 1.427108e+01
## SC(n)  1.443297e+01 1.443435e+01 1.442812e+01 1.448935e+01 1.455805e+01
## FPE(n) 1.437987e+06 1.360938e+06 1.278278e+06 1.284462e+06 1.300396e+06
##                   6            7            8            9           10
## AIC(n) 1.400121e+01 1.400375e+01 1.402980e+01 1.405990e+01 1.406438e+01
## HQ(n)  1.421728e+01 1.424257e+01 1.429135e+01 1.434421e+01 1.437143e+01
## SC(n)  1.453801e+01 1.459706e+01 1.467961e+01 1.476622e+01 1.482720e+01
## FPE(n) 1.204773e+06 1.208095e+06 1.240285e+06 1.278578e+06 1.284776e+06
#VARselect picks p=6
lsfit=VAR(X, p=6, type='const', season = 7, exogen = NULL)

#forecast for the next 2 weeks
preds=predict(lsfit, n.ahead=14)
forecast_VAR2 = preds$fcst$traffic_vol14[1:14,1]

ASE = mean((forecast_VAR2 - metro_2018$tot_traffic_vol[(length(metro_2018$tot_traffic_vol)-14+1):length(metro_2018$tot_traffic_vol)])^2)
ASE
## [1] 9437462
#plot the forecasts
plot(seq(1,273,1),metro_2018$tot_traffic_vol, type = "l", ylim=c(25000,110000))
points(seq(260,273,1),forecast_VAR2, type = "l", col="blue")
points(seq(260,273,1),preds$fcst$traffic_vol14[1:14,2], type = "l", lty=3, col="black")
points(seq(260,273,1),preds$fcst$traffic_vol14[1:14,3], type = "l", lty=3, col="black")

#VAR_3 - with clouds as the regressor
X=cbind(traffic_vol14,avg_clouds14)
VARselect(X, lag.max = 10, type = "const",season = 7, exogen = NULL)
## $selection
## AIC(n)  HQ(n)  SC(n) FPE(n) 
##      1      1      1      1 
## 
## $criteria
##                   1            2            3            4            5
## AIC(n) 2.383189e+01 2.383483e+01 2.383663e+01 2.386532e+01 2.389606e+01
## HQ(n)  2.393424e+01 2.395992e+01 2.398447e+01 2.403591e+01 2.408939e+01
## SC(n)  2.408617e+01 2.414561e+01 2.420392e+01 2.428911e+01 2.437636e+01
## FPE(n) 2.239166e+10 2.245871e+10 2.250094e+10 2.315823e+10 2.388437e+10
##                   6            7            8            9           10
## AIC(n) 2.392710e+01 2.394216e+01 2.396987e+01 2.400060e+01 2.399705e+01
## HQ(n)  2.414317e+01 2.418097e+01 2.423143e+01 2.428490e+01 2.430410e+01
## SC(n)  2.446390e+01 2.453546e+01 2.461968e+01 2.470692e+01 2.475987e+01
## FPE(n) 2.464147e+10 2.502053e+10 2.573018e+10 2.654105e+10 2.645642e+10
#VARselect picks p=1
lsfit=VAR(X, p=1, type='const', season = 7, exogen = NULL)

#forecast for the next 2 weeks
preds=predict(lsfit, n.ahead=14)
forecast_VAR3 = preds$fcst$traffic_vol14[1:14,1]

ASE = mean((forecast_VAR3 - metro_2018$tot_traffic_vol[(length(metro_2018$tot_traffic_vol)-14+1):length(metro_2018$tot_traffic_vol)])^2)
ASE
## [1] 9306680
#plot the forecasts
plot(seq(1,273,1),metro_2018$tot_traffic_vol, type = "l", ylim=c(25000,110000))
points(seq(260,273,1),forecast_VAR3, type = "l", col="blue")
points(seq(260,273,1),preds$fcst$traffic_vol14[1:14,2], type = "l", lty=3, col="black")
points(seq(260,273,1),preds$fcst$traffic_vol14[1:14,3], type = "l", lty=3, col="black")

#VAR_4 - with temp & rain as the regressors
X=cbind(traffic_vol14, avg_temp14, avg_rain14)
VARselect(X, lag.max = 10, type = "const",season = 7, exogen = NULL)
## $selection
## AIC(n)  HQ(n)  SC(n) FPE(n) 
##      7      3      1      7 
## 
## $criteria
##                   1            2            3            4            5
## AIC(n) 1.653628e+01 1.651674e+01 1.637701e+01 1.639915e+01 1.637648e+01
## HQ(n)  1.670686e+01 1.673850e+01 1.664994e+01 1.672326e+01 1.675176e+01
## SC(n)  1.696007e+01 1.706767e+01 1.705508e+01 1.720435e+01 1.730882e+01
## FPE(n) 1.519399e+07 1.490231e+07 1.296217e+07 1.325712e+07 1.296633e+07
##                   6            7            8            9           10
## AIC(n) 1.632410e+01 1.631962e+01 1.634794e+01 1.640211e+01 1.642711e+01
## HQ(n)  1.675055e+01 1.679725e+01 1.687674e+01 1.698209e+01 1.705827e+01
## SC(n)  1.738357e+01 1.750623e+01 1.766168e+01 1.784299e+01 1.799513e+01
## FPE(n) 1.231262e+07 1.226778e+07 1.263312e+07 1.335306e+07 1.371172e+07
#VARselect picks p=7
lsfit=VAR(X, p=7, type='const', season = 7, exogen = NULL)

#forecast for the next 2 weeks
preds=predict(lsfit, n.ahead=14)
forecast_VAR4 = preds$fcst$traffic_vol14[1:14,1]

ASE = mean((forecast_VAR4 - metro_2018$tot_traffic_vol[(length(metro_2018$tot_traffic_vol)-14+1):length(metro_2018$tot_traffic_vol)])^2)
ASE
## [1] 6671298
#plot the forecasts
plot(seq(1,273,1),metro_2018$tot_traffic_vol, type = "l", ylim=c(25000,110000))
points(seq(260,273,1),forecast_VAR4, type = "l", col="blue")
points(seq(260,273,1),preds$fcst$traffic_vol14[1:14,2], type = "l", lty=3, col="black")
points(seq(260,273,1),preds$fcst$traffic_vol14[1:14,3], type = "l", lty=3, col="black")

#VAR_5 - with temp & clouds as the regressors
X=cbind(traffic_vol14, avg_temp14, avg_clouds14)
VARselect(X, lag.max = 10, type = "const",season = 7, exogen = NULL)
## $selection
## AIC(n)  HQ(n)  SC(n) FPE(n) 
##      3      1      1      3 
## 
## $criteria
##                   1            2            3            4            5
## AIC(n) 2.616440e+01 2.620126e+01 2.612961e+01 2.615738e+01 2.613733e+01
## HQ(n)  2.633498e+01 2.642301e+01 2.640254e+01 2.648148e+01 2.651261e+01
## SC(n)  2.658819e+01 2.675218e+01 2.680768e+01 2.696258e+01 2.706967e+01
## FPE(n) 2.307342e+11 2.394336e+11 2.229346e+11 2.292931e+11 2.248539e+11
##                   6            7            8            9           10
## AIC(n) 2.617512e+01 2.618389e+01 2.621019e+01 2.626822e+01 2.629365e+01
## HQ(n)  2.660157e+01 2.666152e+01 2.673900e+01 2.684820e+01 2.692480e+01
## SC(n)  2.723459e+01 2.737050e+01 2.752394e+01 2.770910e+01 2.786167e+01
## FPE(n) 2.336651e+11 2.359199e+11 2.424571e+11 2.572643e+11 2.642858e+11
#VARselect picks p=3
lsfit=VAR(X, p=3, type='const', season = 7, exogen = NULL)

#forecast for the next 2 weeks
preds=predict(lsfit, n.ahead=14)
forecast_VAR5 = preds$fcst$traffic_vol14[1:14,1]

ASE = mean((forecast_VAR5 - metro_2018$tot_traffic_vol[(length(metro_2018$tot_traffic_vol)-14+1):length(metro_2018$tot_traffic_vol)])^2)
ASE
## [1] 9010104
#plot the forecasts
plot(seq(1,273,1),metro_2018$tot_traffic_vol, type = "l", ylim=c(25000,110000))
points(seq(260,273,1),forecast_VAR5, type = "l", col="blue")
points(seq(260,273,1),preds$fcst$traffic_vol14[1:14,2], type = "l", lty=3, col="black")
points(seq(260,273,1),preds$fcst$traffic_vol14[1:14,3], type = "l", lty=3, col="black")

#VAR_6 - with rain & clouds as the regressors
X=cbind(traffic_vol14, avg_rain14, avg_clouds14)
VARselect(X, lag.max = 10, type = "const",season = 7, exogen = NULL)
## $selection
## AIC(n)  HQ(n)  SC(n) FPE(n) 
##      3      1      1      3 
## 
## $criteria
##                   1            2            3            4            5
## AIC(n) 2.036118e+01 2.031603e+01 2.026085e+01 2.029873e+01 2.034335e+01
## HQ(n)  2.053176e+01 2.053778e+01 2.053378e+01 2.062284e+01 2.071863e+01
## SC(n)  2.078497e+01 2.086695e+01 2.093891e+01 2.110393e+01 2.127569e+01
## FPE(n) 6.963104e+08 6.656738e+08 6.300945e+08 6.546599e+08 6.848667e+08
##                   6            7            8            9           10
## AIC(n) 2.028717e+01 2.031098e+01 2.036542e+01 2.042795e+01 2.044497e+01
## HQ(n)  2.071363e+01 2.078861e+01 2.089423e+01 2.100793e+01 2.107612e+01
## SC(n)  2.134665e+01 2.149759e+01 2.167917e+01 2.186884e+01 2.201299e+01
## FPE(n) 6.478759e+08 6.640339e+08 7.019129e+08 7.481402e+08 7.621210e+08
#VARselect picks p=3
lsfit=VAR(X, p=3, type='const', season = 7, exogen = NULL)

#forecast for the next 2 weeks
preds=predict(lsfit, n.ahead=14)
forecast_VAR6 = preds$fcst$traffic_vol14[1:14,1]

ASE = mean((forecast_VAR6 - metro_2018$tot_traffic_vol[(length(metro_2018$tot_traffic_vol)-14+1):length(metro_2018$tot_traffic_vol)])^2)
ASE
## [1] 10177968
#plot the forecasts
plot(seq(1,273,1),metro_2018$tot_traffic_vol, type = "l", ylim=c(25000,110000))
points(seq(260,273,1),forecast_VAR6, type = "l", col="blue")
points(seq(260,273,1),preds$fcst$traffic_vol14[1:14,2], type = "l", lty=3, col="black")
points(seq(260,273,1),preds$fcst$traffic_vol14[1:14,3], type = "l", lty=3, col="black")

#VAR_7 - with temp, rain & clouds as the regressors
X=cbind(traffic_vol14, avg_temp14, avg_rain14, avg_clouds14)
VARselect(X, lag.max = 10, type = "const",season = 7, exogen = NULL)
## $selection
## AIC(n)  HQ(n)  SC(n) FPE(n) 
##      3      1      1      3 
## 
## $criteria
##                   1            2            3            4            5
## AIC(n) 2.270035e+01 2.270413e+01 2.258715e+01 2.263262e+01 2.263494e+01
## HQ(n)  2.295054e+01 2.304529e+01 2.301930e+01 2.315574e+01 2.324903e+01
## SC(n)  2.332191e+01 2.355170e+01 2.366075e+01 2.393224e+01 2.416058e+01
## FPE(n) 7.223306e+09 7.253211e+09 6.456439e+09 6.762992e+09 6.787575e+09
##                   6            7            8            9           10
## AIC(n) 2.259681e+01 2.263022e+01 2.268608e+01 2.279122e+01 2.284226e+01
## HQ(n)  2.330189e+01 2.342627e+01 2.357311e+01 2.376923e+01 2.391124e+01
## SC(n)  2.434848e+01 2.460790e+01 2.488979e+01 2.522095e+01 2.549801e+01
## FPE(n) 6.545262e+09 6.783168e+09 7.193760e+09 8.020017e+09 8.476720e+09
#VARselect picks p=3
lsfit=VAR(X, p=3, type='const', season = 7, exogen = NULL)

#forecast for the next 2 weeks
preds=predict(lsfit, n.ahead=14)
forecast_VAR7 = preds$fcst$traffic_vol14[1:14,1]

ASE = mean((forecast_VAR7 - metro_2018$tot_traffic_vol[(length(metro_2018$tot_traffic_vol)-14+1):length(metro_2018$tot_traffic_vol)])^2)
ASE
## [1] 8979143
#plot the forecasts
plot(seq(1,273,1),metro_2018$tot_traffic_vol, type = "l", ylim=c(25000,110000))
points(seq(260,273,1),forecast_VAR7, type = "l", col="blue")
points(seq(260,273,1),preds$fcst$traffic_vol14[1:14,2], type = "l", lty=3, col="black")
points(seq(260,273,1),preds$fcst$traffic_vol14[1:14,3], type = "l", lty=3, col="black")

Neural Network Models

These models focus on analyzing training samples and utilizing that learning to predict test data. This section starts with defining the train and test time series objects for traffic volume.

#define train and test time series objects
trafficTrain = ts(metro_2018$tot_traffic_vol[1:259], start = 1, frequency = 1)
trafficTest = ts(metro_2018$tot_traffic_vol[260:273], start = 260, frequency = 1)

#NN_1 - no regressor
set.seed(3)
fit.mlp = mlp(trafficTrain, difforder=7, hd.auto.type = "cv", comb = "mean")
fit.mlp
## MLP fit with 3 hidden nodes and 20 repetitions.
## Series modelled in differences: D7.
## Univariate lags: (1)
## Forecast combined using the mean operator.
## MSE: 89696490.6779.
plot(fit.mlp)

#forecast for the next 2 weeks
fore.mlp.NN1=forecast(fit.mlp, h=14)
ASE = mean((trafficTest-fore.mlp.NN1$mean)^2)
ASE
## [1] 8176842
#plot the forecasts
plot(seq(1,273,1),metro_2018$tot_traffic_vol, type = "l")
points(seq(260,273,1),fore.mlp.NN1$mean, type = "l", col="blue")

#NN_2 - with temp as the regressor
ts_reg = data.frame(temp_df = ts(metro_2018$avg_temp))
set.seed(3)
fit.mlp.reg = mlp(trafficTrain, difforder=7, hd.auto.type = "cv", comb = "mean", xreg = ts_reg)
fit.mlp.reg
## MLP fit with 1 hidden node and 20 repetitions.
## Series modelled in differences: D7.
## Univariate lags: (1,4)
## 1 regressor included.
## - Regressor 1 lags: (2,3,4)
## Forecast combined using the mean operator.
## MSE: 84644421.1814.
plot(fit.mlp.reg)

#forecast for the next 2 weeks
fore.mlp.NN2=forecast(fit.mlp.reg, h=14, xreg = ts_reg)
ASE = mean((trafficTest-fore.mlp.NN2$mean)^2)
ASE
## [1] 16675188
#plot the forecasts
plot(seq(1,273,1),metro_2018$tot_traffic_vol, type = "l")
points(seq(260,273,1),fore.mlp.NN2$mean, type = "l", col="blue")

#NN_3 - with rain as the regressor
ts_reg = data.frame(rain_df = ts(metro_2018$avg_rain))
set.seed(3)
fit.mlp.reg = mlp(trafficTrain, difforder=7, hd.auto.type = "cv", comb = "mean", xreg = ts_reg)
fit.mlp.reg
## MLP fit with 3 hidden nodes and 20 repetitions.
## Series modelled in differences: D7.
## Univariate lags: (1)
## Forecast combined using the mean operator.
## MSE: 89696490.6779.
plot(fit.mlp.reg)

#forecast for the next 2 weeks
fore.mlp.NN3=forecast(fit.mlp.reg, h=14, xreg = ts_reg)
ASE = mean((trafficTest-fore.mlp.NN3$mean)^2)
ASE
## [1] 8176842
#plot the forecasts
plot(seq(1,273,1),metro_2018$tot_traffic_vol, type = "l")
points(seq(260,273,1),fore.mlp.NN3$mean, type = "l", col="blue")

#NN_4 - with clouds as the regressor
ts_reg = data.frame(clouds_df = ts(metro_2018$avg_clouds))
set.seed(3)
fit.mlp.reg = mlp(trafficTrain, difforder=7, hd.auto.type = "cv", comb = "mean", xreg = ts_reg)
fit.mlp.reg
## MLP fit with 1 hidden node and 20 repetitions.
## Series modelled in differences: D7.
## Univariate lags: (1)
## 1 regressor included.
## - Regressor 1 lags: (1,2,3)
## Forecast combined using the mean operator.
## MSE: 88172718.1723.
plot(fit.mlp.reg)

#forecast for the next 2 weeks
fore.mlp.NN4=forecast(fit.mlp.reg, h=14, xreg = ts_reg)
ASE = mean((trafficTest-fore.mlp.NN4$mean)^2)
ASE
## [1] 9444744
#plot the forecasts
plot(seq(1,273,1),metro_2018$tot_traffic_vol, type = "l")
points(seq(260,273,1),fore.mlp.NN4$mean, type = "l", col="blue")

#NN_5 - with temp & rain
ts_reg = data.frame(temp_df = ts(metro_2018$avg_temp), rain_df = ts(metro_2018$avg_rain))
set.seed(3)
fit.mlp.reg = mlp(trafficTrain, difforder=7, hd.auto.type = "cv", comb = "mean", xreg = ts_reg)
fit.mlp.reg
## MLP fit with 1 hidden node and 20 repetitions.
## Series modelled in differences: D7.
## Univariate lags: (1,4)
## 1 regressor included.
## - Regressor 1 lags: (2,3,4)
## Forecast combined using the mean operator.
## MSE: 84644421.1814.
plot(fit.mlp.reg)

#forecast for the next 2 weeks
fore.mlp.NN5=forecast(fit.mlp.reg, h=14, xreg = ts_reg)
ASE = mean((trafficTest-fore.mlp.NN5$mean)^2)
ASE
## [1] 16675188
#plot the forecasts
plot(seq(1,273,1),metro_2018$tot_traffic_vol, type = "l")
points(seq(260,273,1),fore.mlp.NN5$mean, type = "l", col="blue")

#NN_6 - with temp & clouds as the regressors
ts_reg = data.frame(temp_df = ts(metro_2018$avg_temp), clouds_df = ts(metro_2018$avg_clouds))
set.seed(3)
fit.mlp.reg = mlp(trafficTrain, difforder=7, hd.auto.type = "cv", comb = "mean", xreg = ts_reg)
fit.mlp.reg
## MLP fit with 1 hidden node and 20 repetitions.
## Series modelled in differences: D7.
## Univariate lags: (1,4)
## 2 regressors included.
## - Regressor 1 lags: (2,3,4)
## - Regressor 2 lags: (1,2,3)
## Forecast combined using the mean operator.
## MSE: 82157496.6644.
plot(fit.mlp.reg)

#forecast for the next 2 weeks
fore.mlp.NN6=forecast(fit.mlp.reg, h=14, xreg = ts_reg)
ASE = mean((trafficTest-fore.mlp.NN6$mean)^2)
ASE
## [1] 23358223
#plot the forecasts
plot(seq(1,273,1),metro_2018$tot_traffic_vol, type = "l")
points(seq(260,273,1),fore.mlp.NN6$mean, type = "l", col="blue")

#NN_7 - with temp, rain & clouds as the regressors
ts_reg = data.frame(temp_df = ts(metro_2018$avg_temp), rain_df = ts(metro_2018$avg_rain), clouds_df = ts(metro_2018$avg_clouds))
set.seed(3)
fit.mlp.reg = mlp(trafficTrain, difforder=7, hd.auto.type = "cv", comb = "mean", xreg = ts_reg)
fit.mlp.reg
## MLP fit with 1 hidden node and 20 repetitions.
## Series modelled in differences: D7.
## Univariate lags: (1,4)
## 2 regressors included.
## - Regressor 1 lags: (2,3,4)
## - Regressor 2 lags: (1,2,3)
## Forecast combined using the mean operator.
## MSE: 82157496.6644.
plot(fit.mlp.reg)

#forecast for the next 2 weeks
fore.mlp.NN7=forecast(fit.mlp.reg, h=14, xreg = ts_reg)
ASE = mean((trafficTest-fore.mlp.NN7$mean)^2)
ASE
## [1] 23358223
#plot the forecasts
plot(seq(1,273,1),metro_2018$tot_traffic_vol, type = "l")
points(seq(260,273,1),fore.mlp.NN7$mean, type = "l", col="blue")

Ensemble Models

#ensemble of VAR_1 and NN_1 models
ensemble_1 = (forecast_VAR1 + fore.mlp.NN1$mean)/2

ASE = mean((trafficTest-ensemble_1)^2)
ASE
## [1] 6232211
#plot the forecasts
plot(seq(1,273,1),metro_2018$tot_traffic_vol, type = "l")
points(seq(260,273,1),ensemble_1, type = "l", col="blue")

#ensemble of VAR_1 and NN_4 models
ensemble_2 = (forecast_VAR1 + fore.mlp.NN4$mean)/2

ASE = mean((trafficTest-ensemble_2)^2)
ASE
## [1] 7035706
#plot the forecasts
plot(seq(1,273,1),metro_2018$tot_traffic_vol, type = "l")
points(seq(260,273,1),ensemble_2, type = "l", col="blue")

#ensemble of VAR_4 and NN_1 models
ensemble_3 = (forecast_VAR4 + fore.mlp.NN1$mean)/2

ASE = mean((trafficTest-ensemble_3)^2)
ASE
## [1] 5627482
#plot the forecasts
plot(seq(1,273,1),metro_2018$tot_traffic_vol, type = "l")
points(seq(260,273,1),ensemble_3, type = "l", col="blue")

#ensemble of VAR_4 and NN_4 models
ensemble_4 = (forecast_VAR4 + fore.mlp.NN4$mean)/2

ASE = mean((trafficTest-ensemble_4)^2)
ASE
## [1] 6301341
#plot the forecasts 
plot(seq(1,273,1),metro_2018$tot_traffic_vol, type = "l")
points(seq(260,273,1),ensemble_4, type = "l", col="blue")

Model Comparison

The below tables list the various models tried on the dataset with their respective ASE and/or AIC values (if available). The average square error (ASE) takes the sum of the forecasts minus the actual value squared and divides that by the number of forecasted values. The AIC (Akaike’s Information Criterion) is a relative measure of model parsimony. In both cases, the smaller/lower value provides evidence to suggest a model with better fit for the data and a more parsimonious model.

Based on the models performed, the ARUMA model has evidence to suggest that it is not a good fit for the data. On the other hand, the VAR model with average daily temperature and average daily rain as the regressors has evidence to suggest that it may be a good fit for the data. The neural network models with regressors are not a great fit, with higher ASE values. The ensemble models provide various options that could be a good fit for this data. The model chosen does not need to necessarily have the lowest ASE value, however the benefit of these models are that they are a combination of previous models. There are various models in a close range that should also be considered when utilizing the data in decision-making.

Summary Table for ARUMA Models
Model Description ASE
ARUMA_1 (13,0,0) with s=7 25,003,365
ARUMA_2 (13,0,2) with s=7 - First fit (8,0) then fit (5,2) on the residuals 22,680,059
Summary Table for VAR Models
Model Description ASE AIC
VAR_1 regressor: temp 7,691,108 1.991473e+01
VAR_2 regressor: rain 9,437,462 1.400121e+01
VAR_3 regressor: clouds 9,306,680 2.383189e+01
VAR_4 regressor: temp & rain 6,671,298 1.631962e+01
VAR_5 regressor: temp & clouds 9,010,104 2.612961e+01
VAR_6 regressor: rain & clouds 10,177,968 2.026085e+01
VAR_7 regressor: temp & rain & clouds 8,979,143 2.258715e+01
Summary Table for Neural Network Models
Model Description ASE Note
NN_1 no regressor 8,176,842 -
NN_2 regressor: temp 16,675,188 -
NN_3 regressor: rain 8,176,842 mlp dropped rain
NN_4 regressor: clouds 9,444,744 -
NN_5 regressor: temp & rain 16,675,188 mlp dropped rain
NN_6 regressor: temp & clouds 23,358,223 -
NN_7 regressor: temp & rain & clouds 23,358,223 mlp dropped rain
Summary Table for Ensemble Models
Model Description ASE Regressor(s)
Ensemble_1 VAR_1 & NN_1 6,232,211 temp for VAR and no regressor for NN
Ensemble_2 VAR_1 & NN_4 7,035,706 temp for VAR and clouds for NN
Ensemble_3 VAR_4 & NN_1 5,627,482 temp & rain for VAR and no regressor for NN
Ensemble_4 VAR_4 & NN_4 6,301,341 temp & rain for VAR and clouds for NN

Conclusion

Based on the analysis, we can conclude that for this particular time series, ARUMA models do not perform well in forecasting the average traffic volume of the day. Just using the time as the predictor, the neural network seems to do a much better job with forecasting for a period of 14 days.

With VAR, the temperature and rain information are useful in predicting the traffic volume, giving more accurate forecasts. It makes sense that the intense weather conditions can have an impact on the traffic on the roads. Overall, the VAR models perform better than the ARUMA and neural network models.

Neural network models perform best with no regressor. The second highest-performing model includes clouds as the regressor. The neural network models do not seem to think rain is a useful regressor in forecasting the traffic volume. We think that may be due to there not being many days where it rained. There is only about a month worth of data where the average rain is not 0, from August 24 to September 30. If we have more data points, the model might think that it is useful.

Overall, ensemble models perform quite well. The ensemble model combining the top performing VAR and neural network models results in the lowest ASE of all models. We believe that each method has its own pros and cons when it comes to forecasting. One may over-predict and the other may under-predict, and the ensemble model is able to take advantages of that, thus producing the best performing model with the highest prediction power. However, it is computationally most expensive among the rest.

For the task at hand, we have chosen to base our judgment off the highest performing model for this traffic volume application, the ensemble that combined the VAR 4 and NN 1 models. The forecast shows a steady rate of traffic over the next 14 days, that seems to follow what has been seen in the recent past. Based on this information, there is sufficient evidence to support lane expansion on the I-94 West highway from Minneapolis to St. Paul.

Neural network models generally take a while to run. Thus, if the computational cost is not a problem, one should choose the ensemble model over the remaining models. Otherwise, one should choose the VAR model which can spit out the results within seconds, if the accuracy is not a high priority. Also, the results may change if we use a different time frame/samples or change the forecast horizon.

Future works may include using data from a different time frame to validate the results and exploring the hyper-parameters of the mlp function to further optimize performance for the neural network models.